Introduction

This project examines tweets about seven popular consumer brands: Disney, McDonald’s, Microsoft, Nintendo, Samsung, Sony and Starbucks. It analyzes trends among the users who tweeted about these brands, and the content associated with each brand.

Methodology

A total of 559,666 tweets containing at least one of the keywords “Disney,” “McDonalds,” “Microsoft,” “Nintendo,” “Samsung,” “Sony” and “Starbucks” were downloaded through a Python API between August 6, 2015 and August 8, 2015. The raw tweets were then converted from JSON to CSV format in Python, retaining only 10 selected fields such as the text, Twitter username, language, location and number of followers.
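The JSON-to-CSV step could look roughly like the following Python sketch. It is not the script used for this project. It assumes the raw file holds one tweet object per line, and that the key paths follow the standard Twitter REST API tweet object (`created_at`, `lang`, `user.followers_count` and so on). The output column names mirror those in the records printed under Trend Analysis. The report does not list which ten fields were kept, so the mapping below is illustrative.

```python
import csv
import io
import json

# Assumed mapping: output column -> key path inside the raw tweet object.
FIELDS = {
    "text": ("text",),
    "time_created": ("created_at",),
    "retweet_counts": ("retweet_count",),
    "language": ("lang",),
    "screen_name": ("user", "screen_name"),
    "followers": ("user", "followers_count"),
    "friends": ("user", "friends_count"),
    "statuses": ("user", "statuses_count"),
    "locations": ("user", "location"),
}


def dig(obj, path):
    """Follow a key path through nested dicts; return "" if a key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return ""
        obj = obj[key]
    return "" if obj is None else obj


def tweets_to_csv(json_lines, out):
    """Write one CSV row per valid tweet line and return the number of rows."""
    writer = csv.DictWriter(out, fieldnames=list(FIELDS))
    writer.writeheader()
    written = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed records
        if not isinstance(tweet, dict):
            continue  # skip valid JSON that is not a tweet object
        writer.writerow({col: dig(tweet, path) for col, path in FIELDS.items()})
        written += 1
    return written


# Made-up input lines, for illustration only.
sample = [
    '{"text": "I love Disney", "lang": "en", '
    '"user": {"screen_name": "a", "followers_count": 10}}',
    "not json",
]
buffer = io.StringIO()
rows_written = tweets_to_csv(sample, buffer)
```

Writing to any file-like object keeps the function easy to try on a few lines before pointing it at the full download.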

Next, a multi-step data manipulation was conducted in R to process the CSV file. Tweets that contained more than one keyword in their text were removed from the data frame, leaving 556,519 tweets in the data set. Because there were 86,248 different locations in the original data set, the majority of tweets were manually relabeled using regular expressions and assigned to a country based on their city or state. Many tweets did not disclose the user’s location in the original data set, and ambiguous locations that did not include a country, state or city, such as “Worldwide,” “Everywhere” or “Your Phone,” were relabeled with an empty field. As a result, 232,118 tweets have an empty location field.
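The two cleaning steps described above were done in R. The Python sketch below only illustrates the idea. The substring matching rule and the three country patterns are assumptions made for this example, and the real relabeling would need far more rules to cover tens of thousands of distinct location strings.

```python
import re

BRANDS = ["disney", "mcdonalds", "microsoft", "nintendo",
          "samsung", "sony", "starbucks"]


def brands_in(text):
    """Return every brand keyword found in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return [brand for brand in BRANDS if brand in lowered]


def single_brand(text):
    """Return the brand when exactly one keyword occurs, otherwise None."""
    found = brands_in(text)
    return found[0] if len(found) == 1 else None


# Hypothetical patterns mapping free-text locations to a country.
COUNTRY_PATTERNS = [
    (re.compile(r"\b(usa|united states|new york|california|texas)\b", re.I),
     "United States"),
    (re.compile(r"\b(uk|united kingdom|england|london)\b", re.I),
     "United Kingdom"),
    (re.compile(r"\b(brasil|brazil|sao paulo)\b", re.I), "Brazil"),
]


def country_for(location):
    """Return the first matching country, or "" for unknown or ambiguous locations."""
    for pattern, country in COUNTRY_PATTERNS:
        if pattern.search(location):
            return country
    return ""
```

Returning an empty string for anything unmatched reproduces the treatment of locations such as “Worldwide,” which end up with an empty location field.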

The 396,207 tweets in the English language were subsetted into a separate data frame. In the first treatment, the tweets were tagged with positive and/or negative sentiments according to dictionaries adapted from the Harvard General Inquirer. Tweets that were not tagged with any sentiment were labeled as neutral. Tweets that contained both positive and negative sentiments in the same text were removed, leaving 170,625 tweets for sentiment analysis. In the second treatment, the English tweets were subsetted by brand and converted into term-document matrices after removing stop words and numbers from their text. The term-document matrices for all seven brands were saved for text mining.
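The first treatment amounts to dictionary-based tagging, which can be sketched as follows. This is an illustration in Python rather than the R code used for the project. The two word sets are tiny stand-ins for the dictionaries adapted from the Harvard General Inquirer, and the "mixed" label is a name invented here for tweets that match both dictionaries.

```python
import re

# Stand-in word lists; the real dictionaries are much larger.
POSITIVE = {"love", "great", "happy", "good"}
NEGATIVE = {"hate", "bad", "awful", "broken"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tag_sentiment(text):
    """Label a tweet as positive, negative, neutral or mixed."""
    words = set(tokenize(text))
    has_positive = bool(words & POSITIVE)
    has_negative = bool(words & NEGATIVE)
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"


# Made-up tweets, for illustration only.
tweets = [
    "I love my new Sony TV",
    "Samsung support is awful",
    "Love the phone, hate the battery",
    "Microsoft released Windows 10",
]
tagged = [(text, tag_sentiment(text)) for text in tweets]
# Drop tweets that carry both sentiments, as described above.
kept = [(text, label) for text, label in tagged if label != "mixed"]
```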

Trend Analysis

Disney is the most tweeted brand, followed by Samsung, Microsoft and Sony. These four brands make up 78% of the Twitter buzz. This is unsurprising because Disney, Samsung and Sony all have wide product and media service offerings, while Microsoft released Windows 10, its new operating system, just a week before the tweets were collected. On the other hand, McDonald’s is the least tweeted brand, with only half as many tweets as its coffee rival Starbucks. This is not unexpected because McDonald’s has been reported on multiple occasions for its failed Twitter campaigns. It is also possible that consumers tweet about its products, such as McNuggets or the Big Mac, without directly mentioning the brand.

The majority of tweets originated from the United States, where they are evenly distributed among Disney, Samsung, Microsoft and Sony. Among the top 10 non-US countries, Disney is the most tweeted brand in the United Kingdom, Brazil, Canada, Mexico and Argentina. Nintendo receives significant mentions only in Japan, its native country, and the United Kingdom. Samsung is the dominant brand in Indonesia, where the company has recently invested in new manufacturing operations. The majority of tweets in Russia are about the electronics brands, while Starbucks has notable mentions only in the United Kingdom, Canada and Mexico.

Brand sentiment analysis reveals that Disney has a 1:1 ratio of positive to negative tweets. Microsoft, Starbucks and Nintendo have a higher proportion of positive tweets than negative tweets, with Nintendo achieving a 2:1 positive-to-negative ratio. On the other hand, Samsung, Sony and McDonald’s have a higher proportion of negative tweets than positive tweets. In particular, despite Samsung’s popularity, the brand has nearly twice as many negative mentions as positive mentions on Twitter.
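The positive-to-negative ratios quoted above are simple quotients of per-brand counts. A minimal Python sketch of that arithmetic follows. The function name and the toy counts are invented for illustration and are not the project's data.

```python
from collections import Counter


def sentiment_ratios(rows):
    """Map each brand to positive/negative, or None when it has no negative tweets.

    rows is an iterable of (brand, sentiment) pairs.
    """
    counts = Counter(rows)
    brands = {brand for brand, _ in counts}
    ratios = {}
    for brand in brands:
        positive = counts[(brand, "positive")]
        negative = counts[(brand, "negative")]
        ratios[brand] = positive / negative if negative else None
    return ratios


# Toy counts chosen to give a 2:1 and a 1:2 ratio.
rows = (
    [("Nintendo", "positive")] * 4
    + [("Nintendo", "negative")] * 2
    + [("Samsung", "positive")] * 1
    + [("Samsung", "negative")] * 2
)
ratios = sentiment_ratios(rows)
```

A ratio above 1 means more positive than negative tweets, and a ratio of 0.5 corresponds to twice as many negative mentions as positive ones.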

Examining how brand sentiment is distributed across the globe shows that the United States, the United Kingdom, India, Japan and Mexico have a higher proportion of negative tweets than positive tweets. Nigeria and France are the only two nations that have significant proportions of positive tweets.

Among the seven brands, Disney has the widest distribution of Twitter followers and friends. The 50th to 75th percentile users appear to be very well connected, and the median user has about 450 followers and follows about 450 users. In other words, users who tweeted about Disney tend to have more followers and follow more people on Twitter than users who tweeted about other brands. However, users who tweeted about the electronics brands all have significantly higher total tweet counts than those who tweeted about Disney, Starbucks and McDonald’s. In particular, the 75th percentile users of those four brands each have about 150,000 total tweets. It is possible that there are many spambots on Twitter promoting the products and services of electronics companies, thereby inflating their tweet counts. Users who tweeted about Starbucks and McDonald’s have similar distributions and much lower total tweet counts, suggesting that the majority of them are real people.
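The percentile comparisons above can be reproduced with a per-brand quartile summary. The sketch below uses Python's standard `statistics.quantiles` with the "inclusive" method, which is an assumption, since the report does not say how its percentiles were computed. The follower counts are made up.

```python
from statistics import quantiles


def quartiles_by_brand(rows):
    """Map each brand to (25th, 50th, 75th percentile) of its values.

    rows is an iterable of (brand, value) pairs. Brands with fewer than
    two values are skipped because quantiles needs at least two points.
    """
    grouped = {}
    for brand, value in rows:
        grouped.setdefault(brand, []).append(value)
    summary = {}
    for brand, values in grouped.items():
        if len(values) < 2:
            continue
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        summary[brand] = (q1, median, q3)
    return summary


# Made-up follower counts, for illustration only.
rows = [("Disney", v) for v in (100, 200, 300, 400, 500)] + [("Sony", 50)]
summary = quartiles_by_brand(rows)
```

The same function works for friends or total tweet counts by swapping the value column.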

Comparison of the users reveals that even the outliers have a significantly higher number of followers than friends. A large number of users have more than 500,000 total tweets and tend to tweet about the electronics brands, but they have few Twitter followers or friends. By contrast, users who boast a high number of followers (1.5 million or greater) or a high number of friends (125,000 or greater) are diverse in the brands that they tweeted about, and generally have fewer than 500,000 total tweets.

User and tweet with the highest number of followers:

##                                                                                                                       text
## 1: RT @KrisSanchez: So... I just got a Cotton Candy Frappuccino from Starbucks. Yeah, it's a thing. http://t.co/kuSexRbRNn
##                      time_created retweet_counts language screen_name
## 1: Sat Aug 08 02:20:24 +0000 2015              0       en   UberFacts
##    followers friends statuses locations     brand
## 1:  11529387       1    94700           Starbucks

User and tweet with the highest number of friends:

##                                                                                text
## 1: Microsoft libera primeira atualização do Windows 10 http://t.co/iWd84K2dyB #tech
##                      time_created retweet_counts language screen_name
## 1: Thu Aug 06 22:27:30 +0000 2015              0       pt  ajcampos01
##    followers friends statuses locations     brand
## 1:   1214415 1206790    40470    Brazil Microsoft

User and tweet with the highest total tweet count:

##                                                                                                                                            text
## 1: RT: @clave4punto0 :¿No dio la talla? Samsung deberá bajar precios de celulares para competir contra Apple y Huawei http://t.co/8TbT4A58eo vi
##                      time_created retweet_counts language screen_name
## 1: Fri Aug 07 17:01:52 +0000 2015              0       es     notiven
##    followers friends statuses locations   brand
## 1:     27516     207  5249759 Venezuela Samsung

Text Mining

Word networks of each brand reveal how the frequent terms are associated with one another among the tweets. The network clusters clearly contain terms that are unique to each brand’s products or services. For example, Sony has a cluster of words about its music service, the English band One Direction and their newest hit song Drag Me Down, and another cluster about its PlayStation console system. Likewise, Microsoft has a network cluster about its Xbox console system, a cluster about its tablet Surface Pro, and another cluster about its Windows operating system. Interestingly, McDonald’s word network does not show its food or beverage products. Instead, it has a significant cluster from a tweet that was possibly retweeted by multiple users: “the founding fathers, who all barely washed their [profanity], wanted me to have an assault rifle in this mcdonalds.” This further suggests that McDonald’s consumers do not mention its brand name when they tweet about its burgers, fries and drinks.
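A word network of this kind is built from term co-occurrence, meaning how many tweets each pair of terms shares. The Python sketch below shows one way to count the edges. It is not the R code behind the networks discussed here, and the stop word list, the `min_count` threshold and the sample tweets are all assumptions for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Small stand-in stop word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "my",
              "on", "for", "rt"}


def terms(text):
    """Return lowercase alphabetic tokens with stop words removed (numbers are dropped)."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS]


def cooccurrence(tweets, min_count=2):
    """Count the tweets shared by each term pair; keep pairs seen at least min_count times."""
    pairs = Counter()
    for text in tweets:
        unique = sorted(set(terms(text)))  # count each pair once per tweet
        pairs.update(combinations(unique, 2))
    return {pair: n for pair, n in pairs.items() if n >= min_count}


# Made-up tweets, for illustration only.
tweets = [
    "Sony Playstation 4 console on sale",
    "New Playstation console from Sony",
    "Sony music: One Direction drag me down",
]
edges = cooccurrence(tweets)
```

Each surviving pair becomes an edge in the network, weighted by its count, so terms that appear together often end up in the same cluster.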

Hierarchical clustering of the frequent terms shows clusters similar to those in the word networks above. In particular, the clusters containing ‘ebay,’ ‘full’ and ‘read’ commonly appear in the dendrograms for Samsung, Microsoft, Sony and Nintendo. It is likely that many tweets associated with these four brands were about selling or trading their electronic products, because consumers who sell their items on eBay typically insert “Full read by eBay” in their tweets.
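Agglomerative clustering of terms can be illustrated with the small pure-Python sketch below. The report does not state which distance measure or linkage method produced its dendrograms. Jaccard distance and single linkage are assumptions chosen here for brevity, and the term-to-tweet mapping is made up.

```python
def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of tweet indices."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def single_linkage(term_docs, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    term_docs maps each term to the set of tweet indices that contain it.
    """
    clusters = [[term] for term in sorted(term_docs)]

    def distance(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(jaccard_distance(term_docs[x], term_docs[y])
                   for x in c1 for y in c2)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Made-up mapping from term to the tweets that contain it.
term_docs = {
    "ebay": {0, 1, 2},
    "full": {0, 1, 2},
    "read": {0, 1},
    "xbox": {3, 4},
    "console": {3, 4, 5},
}
clusters = single_linkage(term_docs, n_clusters=2)
```

Terms that always occur in the same tweets, such as the eBay boilerplate words, have a distance of zero and are merged first, which is why they form a tight cluster.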

Brand Word Clouds

The following are word clouds created from frequently used terms that are associated with each brand: